GP-Pi: Using Genetic Programming with Penalization and Initialization on Genome-Wide Association Study
Abstract
The advancement of chip-based technology has enabled the measurement of millions of DNA sequence variations across the human genome. Experiments have revealed that high-order interactions of single nucleotide polymorphisms (SNPs), rather than individual SNPs, are responsible for complex diseases such as cancer. The challenge of genome-wide association studies (GWASs) is to sift through high-dimensional datasets to find the particular combinations of SNPs that are predictive of these diseases. Genetic Programming (GP) has been widely applied in GWASs. It serves two purposes: attribute selection and/or discriminative modeling. One advantage of discriminative modeling over attribute selection lies in interpretability. However, existing discriminative modeling algorithms do not scale well as the number of SNPs increases. Here, we have developed GP-Pi. We have introduced a penalizing term in the fitness function to penalize trees with common SNPs, and an initializer that uses expert knowledge to seed the population with good attributes. Experimental results on simulated data suggest that GP-Pi outperforms GPAS with statistical significance. GP-Pi was further evaluated on a real GWAS dataset of Rheumatoid Arthritis, obtained from the North American Rheumatoid Arthritis Consortium. Our results, which include potential new discoveries, are consistent with the literature.
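The abstract does not give the form of the penalizing term or of the initializer, so the following Python sketch is only one possible reading of the two ideas, not the GP-Pi implementation. It assumes that "common SNPs" means SNPs shared by many trees in the current population, and that expert knowledge is available as a positive score per SNP. The names `penalized_fitness`, `seed_snps`, `weight`, and the example SNP identifiers are hypothetical.

```python
import random
from collections import Counter


def penalized_fitness(accuracy, tree_snps, snp_counts, population_size, weight=0.5):
    """Hypothetical fitness: accuracy minus a penalty for SNPs shared across trees.

    Assumption: the penalty is the mean fraction of trees in the population
    that contain each SNP used by this tree, scaled by `weight`.
    """
    if not tree_snps:
        return accuracy
    commonness = sum(snp_counts[s] / population_size for s in tree_snps) / len(tree_snps)
    return accuracy - weight * commonness


def seed_snps(expert_scores, k, rng):
    """Draw up to k distinct SNPs, each with probability proportional to its score.

    Assumption: `expert_scores` maps SNP name to a positive relevance score
    supplied by some external source of expert knowledge.
    """
    pool = dict(expert_scores)
    chosen = []
    for _ in range(min(k, len(pool))):
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen


# Toy population: each tree is represented only by the set of SNPs it uses.
population = [{"rs1", "rs2"}, {"rs1", "rs3"}, {"rs4"}]
counts = Counter(s for tree in population for s in tree)

# Same raw accuracy, but the tree built on widely shared SNPs scores lower.
shared = penalized_fitness(0.8, {"rs1", "rs2"}, counts, len(population))
unique = penalized_fitness(0.8, {"rs4"}, counts, len(population))
print(shared < unique)

# Seed an initial tree with SNPs favored by the (made-up) expert scores.
seeded = seed_snps({"rs1": 5.0, "rs2": 1.0, "rs3": 1.0, "rs4": 3.0}, 2, random.Random(0))
print(seeded)
```

Under these assumptions, the penalty pushes the population toward trees that cover different SNPs, while the weighted seeding biases the starting population toward attributes that prior knowledge rates highly.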
Similar resources
Unveiling the genetic loci for a panicle developmental trait using genome-wide association study in rice
Panicle size is highly correlated with grain yield in rice. Identifying additional quantitative trait loci (QTL) for panicle size is limited by the conventional traits used for QTL mapping. To identify more genetic loci for panicle size, a panicle developmental trait (LNTB, the length from panicle neck-knot to the first primary branch in the rachis) related to panicle size ...
Estimation of Discharge over the Submerged Compound Sharp-Crested Weir using Artificial Neural Networks and Genetic Programming
Truncated sharp crested weirs are used to measure flow rate and control upstream water surface in irrigation canals and laboratory flumes. The main advantages of such weirs are ease of construction and capability of measuring a wide range of flows with sufficient accuracy. Artificial neural networks (ANNs) and genetic programming (GP) have recently been used for estimation of hydraulic data. In...
Genome-wide Association Study to Identify Genes and Biological Pathways Associated with Type Traits in Cattle using Pathway Analysis
Extended Abstract. Introduction and Objective: Type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlated with other economically important traits in cattle, including fertility, longevity and carcass traits. The present study aimed to conduct a genome-wide association study (GWAS) based on gene-set enrichment analysis for identifying the ...
Genome Wide Association Studies, Next Generation Sequencing and Their Application in Animal Breeding and Genetics: A Review
Recently, genetic studies have been revolutionized by next generation sequencing (NGS) technology, and it is expected that the use of this technology will largely eliminate defects in the methods of association studies. NGS technology is becoming the premier tool in genetics. However, at the moment the use of this method is limited, especially in livestock, due to high cost and computation...
RELATIONSHIP OF TENSILE STRENGTH OF STEEL FIBER REINFORCED CONCRETE BASED ON GENETIC PROGRAMMING
Estimating mechanical properties of concrete before designing reinforced concrete structures is among the design requirements. Steel fibers have a considerable effect on the mechanical properties of reinforced concrete, particularly its tensile strength. So far, numerous studies have been done to estimate the relationship between tensile strength of steel fiber reinforced concrete (SFRC) and ot...